
Make executor mcp ensure a durable daemon and bridge to it#1196

Open
RhysSullivan wants to merge 1 commit into main from phase1-mcp-daemon

@RhysSullivan

Owner

What

executor mcp no longer owns the local database. It ensures a durable, detached daemon and bridges stdio JSON-RPC to it over HTTP. Concurrent cold starts run a race-safe election: exactly one process becomes the owner, and the rest wait for its manifest and attach rather than failing. The owner's lifetime is independent of any MCP client, so multiple MCP clients, the web UI, and the desktop app all share one local server.

Supersedes #1033, which had the first executor mcp process start a server in-process and bridge to itself. That tied the shared owner's lifetime to a transient client: when it exited, the server everyone else attached to went down. This builds on the merged start-lock primitive instead, so no client ever owns the database.
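The one-winner rule can be illustrated with an exclusive file create. This is a minimal sketch, not the PR's start-lock primitive: `tryBecomeOwner`, the lock file name, and the in-process loop standing in for six concurrent clients are all assumptions made for illustration. The real election also waits for the owner's manifest and handles stale locks, which this sketch leaves out.

```typescript
import { closeSync, mkdtempSync, openSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type ElectionResult = "owner" | "attach";

// Hypothetical helper: try to win the election by creating the lock file exclusively.
function tryBecomeOwner(lockPath: string): ElectionResult {
  try {
    // The "wx" flag fails with EEXIST when the file already exists,
    // so at most one caller can create it.
    const fd = openSync(lockPath, "wx");
    closeSync(fd);
    return "owner";
  } catch (error) {
    // Losing the race is expected; any other failure is a real error.
    if ((error as { code?: string }).code === "EEXIST") return "attach";
    throw error;
  }
}

// Six simulated clients racing on one cold data directory (assumed layout).
const electionDir = mkdtempSync(join(tmpdir(), "election-sketch-"));
const lockPath = join(electionDir, "daemon.start.lock");
const results: ElectionResult[] = Array.from({ length: 6 }, () => tryBecomeOwner(lockPath));
rmSync(electionDir, { recursive: true, force: true });

console.log(results.join(" "));
```

The design point is that losing the race is a normal outcome that leads to attaching, not an error that fails the client.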

Tests

  • e2e/local/cli-mcp-daemon-attach-stress.test.ts: cold-start race, attach storm, and kill-under-load against the local dev server.
  • e2e/cli/election-cold-start.test.ts: fires N simultaneous clients at one cold data dir on the cli VM targets and asserts exactly one daemon is elected and every client attaches.
  • e2e/cli/election-cold-start.win.ps1: the same election proven on real Windows.

Verified the one-winner, rest-attach behavior on macOS, Linux, and Windows:

cold  ok=6 n=6 spawned=1 manifests=1 health=200
warm  ok=6 n=6 spawned=0 manifests=1 health=200

(cold: one client spawns the daemon, the other five attach, all round-trip; warm: all six attach, none spawn.)
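The invariants described above can be checked mechanically from the summary lines. The following is a sketch only: `parseSummary` and `electionHeld` are hypothetical helpers, not code from this PR, and the `splitBrain` line is a constructed counterexample rather than observed output.

```typescript
interface ElectionSummary {
  phase: string;
  ok: number;
  n: number;
  spawned: number;
  manifests: number;
  health: number;
}

// Parse a line such as "cold  ok=6 n=6 spawned=1 manifests=1 health=200".
function parseSummary(line: string): ElectionSummary {
  const [phase = "", ...pairs] = line.trim().split(/\s+/);
  const fields = new Map<string, number>();
  for (const pair of pairs) {
    const eq = pair.indexOf("=");
    if (eq > 0) fields.set(pair.slice(0, eq), Number(pair.slice(eq + 1)));
  }
  // Missing fields become NaN, which fails every equality check below.
  const get = (key: string): number => fields.get(key) ?? Number.NaN;
  return {
    phase,
    ok: get("ok"),
    n: get("n"),
    spawned: get("spawned"),
    manifests: get("manifests"),
    health: get("health"),
  };
}

// Cold start: exactly one spawn. Warm start: no spawn. Both: one manifest, all clients ok.
function electionHeld(s: ElectionSummary): boolean {
  const expectedSpawned = s.phase === "cold" ? 1 : 0;
  return s.ok === s.n && s.spawned === expectedSpawned && s.manifests === 1 && s.health === 200;
}

const cold = parseSummary("cold  ok=6 n=6 spawned=1 manifests=1 health=200");
const warm = parseSummary("warm  ok=6 n=6 spawned=0 manifests=1 health=200");
// Hypothetical failing case: two daemons spawned on a cold start.
const splitBrain = parseSummary("cold  ok=6 n=6 spawned=2 manifests=1 health=200");

console.log(electionHeld(cold), electionHeld(warm), electionHeld(splitBrain));
```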

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 28, 2026


Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ Deployment successful! (View logs) | executor-marketing | 37ff13e | Commit Preview URL, Branch Preview URL | Jun 29 2026, 04:28 AM |

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 28, 2026


Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ✅ Deployment successful! (View logs) | executor-cloud | 37ff13e | Jun 29 2026, 04:29 AM |

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

Cloudflare preview

Console https://executor-preview-pr-1196.executor-e2e.workers.dev
MCP https://executor-preview-pr-1196.executor-e2e.workers.dev/mcp
Deployed commit 37ff13e

Sign-in is Cloudflare Access (one-time PIN to an allowed email). The preview has its own database and encryption key; it is destroyed when this PR closes.

@greptile-apps

greptile-apps Bot commented Jun 28, 2026


Greptile Summary

This PR replaces the in-process MCP server approach with a pure stdio-to-HTTP bridge: executor mcp no longer owns the database. It ensures a durable detached daemon (via a race-safe election built on a file-system lock) and forwards JSON-RPC between the MCP client's stdio and the daemon's /mcp endpoint over Streamable HTTP.

  • Election loop (spawnAndWaitForDaemon): acquires a file-system start-lock; losers poll for the winner's manifest rather than failing, and all three attempts can reclaim a stale lock if the prior holder died before writing the manifest.
  • runMcpHttpBridge: manages bidirectional JSON-RPC forwarding, teardown on SIGINT/SIGTERM/stdin-close/transport-close, and graceful cleanup via Promise.allSettled to avoid hanging on close errors.
  • Tests: three new suites cover the attach storm, cold-start race, and kill-under-load scenarios on macOS, Linux, and Windows.
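The teardown behavior described for the bridge (several triggers, one cleanup, no hang when a close fails) can be sketched as follows. This is an illustration under assumptions, not the code in `main.ts`: the `Closable` interface, `makeShutdown`, and the two stand-in transports are hypothetical.

```typescript
interface Closable {
  close(): Promise<void>;
}

// Returns a shutdown function that closes both sides at most once.
// Promise.allSettled waits for both closes and never rejects, so a failing
// close on one side cannot block or skip the other.
function makeShutdown(stdio: Closable, http: Closable): () => Promise<void> {
  let closing: Promise<void> | undefined;
  return () => {
    const pending =
      closing ?? Promise.allSettled([stdio.close(), http.close()]).then(() => undefined);
    closing = pending;
    return pending;
  };
}

// Stand-in transports that count how often close() is called.
let stdioCloses = 0;
let httpCloses = 0;
const stdioSide: Closable = {
  close: async () => {
    stdioCloses += 1;
    throw new Error("stdio already closed"); // simulated close error
  },
};
const httpSide: Closable = {
  close: async () => {
    httpCloses += 1;
  },
};

const shutdown = makeShutdown(stdioSide, httpSide);
// Two triggers firing back to back, for example SIGTERM followed by stdin close.
const first = shutdown();
const second = shutdown();

console.log(stdioCloses, httpCloses, first === second);
```

Caching the in-flight promise means every trigger can call the same function without coordinating with the others.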

Confidence Score: 5/5

The production bridge and election code is well-structured — lock acquisition, daemon spawn, manifest polling, and teardown are all correctly guarded.

The core logic in main.ts is solid: the lock holder always releases via Effect.ensuring, waitForDaemonStartupTarget polls by manifest (not just URL) so a port shift doesn't strand losers, and the bridge shutdown path is idempotent at every layer. No data-loss, crash, or security concern was found in the changed production paths.

e2e/local/cli-mcp-daemon-attach-stress.test.ts — the readFileSync missing-import bug (already flagged in a previous thread) leaves stopAutoSpawnedDaemon silently unable to kill the auto-spawned daemon, so orphan processes can accumulate across cold-start race test runs.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/cli/src/main.ts | Core refactor: removes in-process MCP server, adds election loop (spawnAndWaitForDaemon) and HTTP bridge (runMcpHttpBridge); logic is well-guarded with idempotent finish/close helpers. |
| e2e/local/cli-mcp-daemon-attach-stress.test.ts | New stress scenarios for attach storm, cold-start race, and kill-under-load; contains the previously flagged readFileSync missing import in stopAutoSpawnedDaemon; cold-start race test does not assert spawned===1, so a split-brain spawn would pass undetected. |
| e2e/cli/election-cold-start.test.ts | Cross-OS election proof driven over SSH; correctly asserts manifests===1 and all clients succeed; spawned count is computed but not asserted. |
| e2e/cli/election-cold-start.win.ps1 | Windows companion script proving the election on real Windows; success is detected via stdout content because ExitCode is unreliable under -NoNewWindow on PS 5.1 (documented in comment). |
| apps/cli/package.json | Adds @modelcontextprotocol/sdk ^1.29.0 dependency for StdioServerTransport and StreamableHTTPClientTransport. |
| bun.lock | Lock file updated to reflect the new @modelcontextprotocol/sdk dependency. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant C1 as executor mcp (client 1)
    participant C2 as executor mcp (client 2)
    participant FS as Filesystem (start-lock + manifest)
    participant D as Daemon process

    C1->>FS: readActiveLocalServerManifest()
    C2->>FS: readActiveLocalServerManifest()
    FS-->>C1: null (cold start)
    FS-->>C2: null (cold start)

    C1->>FS: "acquireDaemonStartLock() - held=true"
    C2->>FS: acquireDaemonStartLock() - contention error

    Note over C2: isStartLockContention - held=false
    C2->>FS: waitForDaemonStartupTarget (polls manifest)

    C1->>D: spawnDetached (daemon)
    D->>FS: writes manifest (origin + auth token)
    D-->>C1: waitForDaemonStartupTarget - readyUrl
    C1->>FS: releaseDaemonStartLock()

    FS-->>C2: readReachableLocalServerHint() - manifest found
    Note over C2: ready = manifest.connection.origin

    C1->>FS: readActiveLocalServerManifest() - manifest
    C2->>FS: readActiveLocalServerManifest() - manifest

    C1->>D: runMcpHttpBridge (stdio - HTTP /mcp)
    C2->>D: runMcpHttpBridge (stdio - HTTP /mcp)

    Note over C1,D: JSON-RPC forwarded bidirectionally
    Note over C1: stdin close / SIGTERM - shutdown() - closeBoth()

Reviews (2): Last reviewed commit: "feat(cli): make executor mcp ensure a du..."

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";
import { Effect } from "effect";
import { mkdtempSync, readdirSync, rmSync } from "node:fs";

P1 readFileSync is used in stopAutoSpawnedDaemon but is not present in the node:fs import. At runtime this throws a ReferenceError, which the surrounding try/catch silently swallows, so the auto-spawned daemon is never sent SIGTERM. Subsequent rmSync still removes the data directory, leaving an orphan daemon process behind — accumulating across repeated test runs.

Suggested change
- import { mkdtempSync, readdirSync, rmSync } from "node:fs";
+ import { mkdtempSync, readdirSync, readFileSync, rmSync } from "node:fs";
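Beyond adding the import, a narrower catch would keep this class of bug from being hidden again. The body of stopAutoSpawnedDaemon is not shown in this thread, so the following is only a sketch under the assumption that the helper reads a pid from a file; `readDaemonPid` and the `daemon.pid` file name are hypothetical.

```typescript
import { mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: returns the recorded pid, or undefined when no pid file exists.
function readDaemonPid(pidPath: string): number | undefined {
  let raw: string;
  try {
    raw = readFileSync(pidPath, "utf8");
  } catch (error) {
    // Only "file not found" is an expected outcome. Anything else, including a
    // ReferenceError from a missing import, is rethrown instead of swallowed.
    if ((error as { code?: string }).code === "ENOENT") return undefined;
    throw error;
  }
  const pid = Number.parseInt(raw.trim(), 10);
  return Number.isInteger(pid) && pid > 0 ? pid : undefined;
}

// Exercise both paths in a throwaway directory (assumed layout).
const pidDir = mkdtempSync(join(tmpdir(), "daemon-pid-sketch-"));
const pidPath = join(pidDir, "daemon.pid");
const beforeWrite = readDaemonPid(pidPath);
writeFileSync(pidPath, "4242\n");
const afterWrite = readDaemonPid(pidPath);
rmSync(pidDir, { recursive: true, force: true });

console.log(beforeWrite, afterWrite);
```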

@pkg-pr-new

pkg-pr-new Bot commented Jun 28, 2026


@executor-js/cli

npm i https://pkg.pr.new/@executor-js/cli@1196

@executor-js/config

npm i https://pkg.pr.new/@executor-js/config@1196

@executor-js/execution

npm i https://pkg.pr.new/@executor-js/execution@1196

@executor-js/sdk

npm i https://pkg.pr.new/@executor-js/sdk@1196

@executor-js/codemode-core

npm i https://pkg.pr.new/@executor-js/codemode-core@1196

@executor-js/runtime-quickjs

npm i https://pkg.pr.new/@executor-js/runtime-quickjs@1196

@executor-js/plugin-file-secrets

npm i https://pkg.pr.new/@executor-js/plugin-file-secrets@1196

@executor-js/plugin-graphql

npm i https://pkg.pr.new/@executor-js/plugin-graphql@1196

@executor-js/plugin-keychain

npm i https://pkg.pr.new/@executor-js/plugin-keychain@1196

@executor-js/plugin-mcp

npm i https://pkg.pr.new/@executor-js/plugin-mcp@1196

@executor-js/plugin-onepassword

npm i https://pkg.pr.new/@executor-js/plugin-onepassword@1196

@executor-js/plugin-openapi

npm i https://pkg.pr.new/@executor-js/plugin-openapi@1196

executor

npm i https://pkg.pr.new/executor@1196

commit: 37ff13e

executor mcp no longer starts a server in-process. It ensures a durable
detached daemon and bridges stdio JSON-RPC to that owner over HTTP.
Concurrent cold starts run a race-safe election: one process becomes the
owner and the rest wait for its manifest and attach instead of failing.
The owner's lifetime is independent of any MCP client, so many clients,
the web UI, and the desktop app share one local server.

This builds on the merged start-lock primitive so no client ever owns the
database, replacing the earlier approach where the first mcp process
started a server in-process and bridged to itself.

Adds a cold-start election probe across the cli VM targets plus a local
attach stress test; the one-winner, rest-attach behavior is verified on
macOS, Linux, and Windows.